Abstract

Materialisation precomputes all consequences of a set of facts and a datalog program so that queries can be evaluated directly (i.e., independently from the program). Rewriting optimises materialisation for datalog programs with equality by replacing all equal constants with a single representative; and incremental maintenance algorithms can efficiently update a materialisation for small changes in the input facts. Both techniques are critical to practical applicability of datalog systems; however, we are unaware of an approach that combines rewriting and incremental maintenance. In this paper we present the first such combination, and we show empirically that it can speed up updates by several orders of magnitude compared to using either rewriting or incremental maintenance in isolation.


1 Introduction 


Datalog [Abiteboul et al., 1995] is a declarative, rule-based language that can describe (possibly recursive) data dependencies. It is widely used in applications as diverse as enterprise data management [Aref, 2010] and query answering over ontologies in the OWL 2 RL profile [Motik et al., 2009] extended with SWRL rules [Horrocks et al., 2004].

Querying the set $\Pi^\infty(E)$ of consequences of a set of explicit facts $E$ and a datalog program $\Pi$ is a key service in datalog systems. It can be supported by precomputing and storing $\Pi^\infty(E)$ so that queries can be evaluated directly, without further reference to $\Pi$. Set $\Pi^\infty(E)$ and the process of computing it are called the materialisation of $E$ w.r.t. $\Pi$. This technique is used in state-of-the-art systems such as Owlgres [Stocker and Smith, 2008], WebPIE [Urbani et al., 2012], Oracle's RDF store [Wu et al., 2008], GraphDB (formerly OWLIM) [Bishop et al., 2011], and RDFox [Motik et al., 2014].

Although datalog traditionally employs the unique name assumption (UNA), in some applications uniqueness of identifiers cannot be guaranteed. For example, due to the distribution and the independence of data sources, in the Semantic Web different identifiers are often used to refer to the same domain object. Handling such use cases requires an extension of datalog without UNA, in which one can infer equalities between constants using a special equality predicate $\approx$ that can occur in facts and rule heads. The semantics of $\approx$ can be captured explicitly using rules that axiomatise $\approx$ as a congruence relation; however, this is known to be inefficient when equality is used extensively. Therefore, systems commonly use rewriting [Baader and Nipkow, 1998; Nieuwenhuis and Rubio, 2001], an optimisation where equal constants are replaced with a canonical representative, and only facts containing such representatives are stored. The benefits of rewriting have been well documented in practice [Wu et al., 2008; Urbani et al., 2012; Bishop et al., 2011; Motik et al., 2015a].

Moreover, datalog applications often need to handle continuous updates to the set of explicit facts $E$. Rematerialisation (i.e., computing the materialisation from scratch) is often very costly, so incremental maintenance algorithms are commonly used in practice. Adding facts to $E$ is trivial as one can simply continue from where the initial materialisation has finished; hence, given a materialisation $\Pi^\infty(E)$ of $E$ w.r.t. $\Pi$ and a set of facts $E^-$, the main challenge for an incremental algorithm is to efficiently compute $\Pi^\infty(E \setminus E^-)$. Several such algorithms have already been proposed. Truth maintenance systems [Doyle, 1979; de Kleer, 1986; Goasdoué et al., 2013] track dependencies between facts to efficiently determine whether a fact has a derivation from $E \setminus E^-$, so only facts for which no such derivations exist are deleted. Such approaches, however, store large amounts of auxiliary information and are thus often unsuitable for data-intensive applications. Counting [Nicolas and Yazdanian, 1983; Gupta et al., 1993; Urbani et al., 2013; Goasdoué et al., 2013] stores with each fact $F \in \Pi^\infty(E)$ the number of times $F$ has been derived during initial materialisation, and this number is used to determine when to delete $F$; however, in its basic form counting works only with nonrecursive rules, and a proposed extension to recursive rules requires multiple counts per fact [Dewan et al., 1992], which can be costly. The Delete/Rederive (DRed) algorithm [Gupta et al., 1993] handles recursive rules with no storage overhead: to delete $E^-$ from $E$, the algorithm first overdeletes all consequences of $E^-$ in $\Pi^\infty(E)$ and then rederives all facts provable from $E \setminus E^-$. The Backward/Forward (B/F) algorithm combines backward and forward chaining in a way that outperforms DRed on inputs where facts have many alternative derivations, a common scenario in Semantic Web applications [Motik et al., 2015b].
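To make the overdelete/rederive idea concrete, the following is a minimal Python sketch of DRed for plain datalog over triples, without equality or rewriting. It is an illustration under simplifying assumptions rather than the algorithm of Gupta et al. as published: rules are (head, body) pairs, variables are strings starting with `?`, matching is naive, and rederivation simply resumes the fixpoint from the surviving facts. All function names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def matches(body, facts, sigma=None):
    """Yield each substitution that maps every body atom to a fact in `facts`."""
    sigma = sigma or {}
    if not body:
        yield sigma
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        tau = dict(sigma)
        if all(tau.setdefault(t, c) == c if is_var(t) else t == c
               for t, c in zip(first, fact)):
            yield from matches(rest, facts, tau)

def apply(atom, sigma):
    return tuple(sigma.get(t, t) for t in atom)

def materialise(rules, facts):
    """Naive fixpoint: apply all rules until no new fact is derived."""
    M = set(facts)
    while True:
        new = {apply(h, s) for h, body in rules for s in matches(body, M)} - M
        if not new:
            return M
        M |= new

def dred_delete(rules, E, M, E_minus):
    """Return the materialisation of E minus E_minus, given the materialisation M of E."""
    E_new = E - E_minus
    # Overdeletion: collect every fact with some derivation that uses a deleted fact.
    deleted = E & E_minus
    frontier = set(deleted)
    while frontier:
        step = {apply(h, s)
                for h, body in rules for s in matches(body, M)
                if any(apply(b, s) in frontier for b in body)}
        frontier = step - deleted
        deleted |= frontier
    # Rederivation: keep surviving and explicit facts, then resume the fixpoint.
    return materialise(rules, (M - deleted) | E_new)

# Hypothetical example: reachability with an alternative derivation of path(a, c).
rules = [(("?x", "path", "?y"), [("?x", "edge", "?y")]),
         (("?x", "path", "?z"), [("?x", "path", "?y"), ("?y", "edge", "?z")])]
E = {("a", "edge", "b"), ("b", "edge", "c"), ("a", "edge", "c")}
M = materialise(rules, E)
M_new = dred_delete(rules, E, M, {("a", "edge", "c")})
```

In this example the fact `("a", "path", "c")` is first overdeleted, because one of its derivations uses the removed edge, and is then rederived via `b`. This is the kind of repeated work that B/F aims to avoid when facts have many alternative derivations.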

Combining rewriting and incremental maintenance is difficult due to complex interactions between the two techniques: removing $E^-$ from $E$ may entail retracting equalities, which may (partially) invalidate the rewriting and require the restoration of rewritten facts (see Section 3). To the best of our knowledge, such a combination has not been considered in the literature, and practical systems either use rewriting with rematerialisation, or axiomatise equality and use incremental maintenance; in either case they give up a technique known to be critical for performance. In this paper we present the B/F$^\approx$ algorithm, which combines rewriting with B/F: given a set of facts $E^-$, our algorithm efficiently updates the materialisation of $E$ w.r.t. $\Pi$ computed using the rewriting approach by Motik et al. (2015a). Extensions of datalog with equality are nowadays used mainly for querying RDF data extended with OWL 2 RL ontologies and SWRL rules, so we formalise our algorithm in the framework of RDF; however, our approach can easily be adapted to general datalog.

We have implemented B/F$^\approx$ in the open-source RDFox system¹ and have evaluated it on several real-world and synthetic datasets. Our results show that the algorithm indeed combines the best of both worlds, as it is often several orders of magnitude faster than either rematerialisation with rewriting, or B/F with axiomatised equality.


2 Preliminaries 

Datalog. A term is a constant ($a$, $b$, $A$, $R$, etc.) or a variable ($x$, $y$, $z$, etc.). An (RDF) atom has the form $\langle t_1, t_2, t_3 \rangle$, where $t_1, t_2, t_3$ are terms; an (RDF) fact (also called a triple) is a variable-free RDF atom; and a dataset is a finite set of facts. A (datalog) rule $r$ is an implication of the form (1), where $H, B_1, \dots, B_n$ are atoms and each variable occurring in $H$ also occurs in some $B_i$; $h(r) := H$ is the head atom of $r$; each $B_i$ is a body atom of $r$; and $b(r)$ is the set of all body atoms of $r$. A (datalog) program is a finite set of rules.

$$H \leftarrow B_1 \wedge \dots \wedge B_n \qquad (1)$$

A substitution is a partial mapping of variables to terms. For $\alpha$ a term, atom, rule, or a set of these, $\mathsf{voc}(\alpha)$ is the set of all constants in $\alpha$, and $\alpha\sigma$ is the result of applying a substitution $\sigma$ to $\alpha$. The materialisation $\Pi^\infty(E)$ of a dataset $E$ w.r.t. a program $\Pi$ is the smallest superset of $E$ containing $h(r)\sigma$ for each rule $r \in \Pi$ and substitution $\sigma$ with $b(r)\sigma \subseteq \Pi^\infty(E)$.

Equality. The constant owl:sameAs (abbreviated $\approx$) can be used to encode equality between constants. For example, fact $\langle \mathit{P\_Smith}, \approx, \mathit{Peter\_Smith} \rangle$ states that P_Smith and Peter_Smith are one and the same object. Facts of the form $\langle s, \approx, t \rangle$ are called equalities and, for readability, are abbreviated as $s \approx t$; note that ${\approx} \in \mathsf{voc}(s \approx t)$. Program $\Pi^\approx$ consisting of rules ($\approx_1$)–($\approx_4$) axiomatises $\approx$ as a congruence relation. If a program $\Pi$ or a dataset $E$ contains $\approx$, systems then answer queries in the materialisation of $E$ w.r.t. $\Pi \cup \Pi^\approx$.

$$\langle x_1', x_2, x_3 \rangle \leftarrow \langle x_1, x_2, x_3 \rangle \wedge x_1 \approx x_1' \qquad (\approx_1)$$
$$\langle x_1, x_2', x_3 \rangle \leftarrow \langle x_1, x_2, x_3 \rangle \wedge x_2 \approx x_2' \qquad (\approx_2)$$
$$\langle x_1, x_2, x_3' \rangle \leftarrow \langle x_1, x_2, x_3 \rangle \wedge x_3 \approx x_3' \qquad (\approx_3)$$
$$x_i \approx x_i \leftarrow \langle x_1, x_2, x_3 \rangle, \text{ for } 1 \le i \le 3 \qquad (\approx_4)$$

Rewriting is a well-known optimisation of this approach. For $\pi$ a mapping of constants to constants and $\alpha$ a constant, fact, rule, dataset, or substitution, $\pi(\alpha)$ is the result of replacing each constant $c$ in $\alpha$ with $\pi(c)$; such $\alpha$ is normal w.r.t. $\pi$ if $\pi(\alpha) = \alpha$; and $\pi(\alpha)$ is the representative of $\alpha$ in $\pi$. For $c$ a constant, let $c^\pi := \{ d \mid \pi(d) = c \}$. For $U$ a dataset, let $U^\pi := \{ \langle s, p, o \rangle \mid \langle \pi(s), \pi(p), \pi(o) \rangle \in U \}$; and, for $F$ a fact, let $F^\pi := \{F\}^\pi$. We assume that all constants are totally ordered such that $\approx$ is the smallest constant; then, for $S$ a nonempty set of constants, $\min S$ (resp. $\max S$) is the smallest (resp. greatest) element of $S$. Let $U$ be a dataset and let $E_c(U) := \{c\} \cup \{ d \mid c \approx d \in U \}$; then, the rewriting of $U$ is the pair $(\pi, I)$ such that

1. $\pi(c) = \min E_c(U)$ for each constant $c$, and

2. $I = \pi(U)$.

Note that $\pi(\approx) = {\approx}$, that the rewriting is unique for $U$, and that $(\Pi^\approx)^\infty(U) = U$ implies $I^\pi = U$. The r-materialisation of a dataset $E$ w.r.t. a program $\Pi$ is the rewriting $(\pi, I)$ of the dataset $J = (\Pi \cup \Pi^\approx)^\infty(E)$. Motik et al. (2015a) show how to answer queries over $J$ by materialising $(\pi, I)$ instead of $J$.

1 http://www.cs.ox.ac.uk/isg/tools/RDFox/


3 Updating R-Materialisation Incrementally

Let $E$ and $E^-$ be datasets, let $E' = E \setminus E^-$, and let $\Pi$ be a program. Moreover, let $J$ (resp. $J'$) be the materialisation of $E$ (resp. $E'$) w.r.t. $\Pi \cup \Pi^\approx$, and let $(\pi, I)$ (resp. $(\pi', I')$) be the r-materialisation of $E$ (resp. $E'$) w.r.t. $\Pi$. Given $(\pi, I)$, $\Pi$, and $E^-$, the B/F$^\approx$ algorithm computes $(\pi', I')$ efficiently by combining the B/F algorithm by Motik et al. (2015b) for incremental maintenance in datalog without equality with the r-materialisation algorithm by Motik et al. (2015a). We discuss the intuition in Section 3.1 and some optimisations in Section 3.2, and we formalise the algorithm in Section 3.3.


3.1 Intuition


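Main Difficulty. An update may lead to the deletion of equalities, which may require adding facts to $I$. The following example program $\Pi$ and dataset $E$ exhibit such behaviour.

$\Pi = \{\, y_1 \approx y_2 \leftarrow \langle y_1, R, x \rangle \wedge \langle y_2, R, x \rangle,$
$\qquad\;\; y_1 \approx y_2 \leftarrow \langle x, R, y_1 \rangle \wedge \langle x, R, y_2 \rangle \,\}$

$E = \{\, \langle a, R, b \rangle, \langle c, R, d \rangle, \langle a, R, d \rangle \,\}$

$I = \{\, \langle a, R, b \rangle,\; a \approx a,\; R \approx R,\; b \approx b,\; {\approx} \approx {\approx} \,\}$

$\pi = \{\, a \mapsto a,\; b \mapsto b,\; c \mapsto a,\; d \mapsto b,\; R \mapsto R,\; {\approx} \mapsto {\approx} \,\}$

$E^- = \{\, \langle a, R, d \rangle \,\}$

$I' = \{\, \langle a, R, b \rangle,\; a \approx a,\; R \approx R,\; b \approx b,\; {\approx} \approx {\approx},\; \langle c, R, d \rangle,\; c \approx c,\; d \approx d \,\}$

$\pi' = \{\, a \mapsto a,\; b \mapsto b,\; c \mapsto c,\; d \mapsto d,\; R \mapsto R,\; {\approx} \mapsto {\approx} \,\}$

Relation $R$ is bijective in $\Pi$, so $a \approx c \in J$ as both $a$ and $c$ have outgoing $R$-edges to $d$, and $b \approx d \in J$ as both $b$ and $d$ have incoming $R$-edges from $a$. By rewriting, we represent each fact $\langle \alpha, R, \beta \rangle$ from $J$ using a single fact $\langle a, R, b \rangle$, and analogously for facts involving $\approx$; thus, instead of 14 facts, we store just five facts. Assume now that we remove $E^-$ from $E$. In $J$ and $J'$ we ascribe no particular meaning to $\approx$, so the monotonicity of datalog ensures $J' \subseteq J$; thus, the B/F algorithm just needs to delete facts that no longer hold. However, $a \approx c \notin J'$ and $b \approx d \notin J'$, so we must update $\pi$ and extend $I$ with the facts from $J'$ that are not represented via $\pi'$. Thus, in our example, $I'$ actually contains $I$.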
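The numbers in this example can be checked mechanically. The Python sketch below hard-codes the two example rules together with rules ($\approx_1$)–($\approx_4$) in a naive fixpoint, and then computes the rewriting as defined in Section 2. It is only an illustration: the string `owl:sameAs` stands for $\approx$, the order on constants is an assumed one ($\approx$ first, then alphabetical), and the function names are hypothetical.

```python
from itertools import product

SAME = "owl:sameAs"  # stands for the equality constant

def materialise_example(facts):
    """Naive fixpoint for the example program plus the equality axioms."""
    J = set(facts)
    while True:
        new = set()
        for (s, p, o) in J:
            # Reflexivity rules: each constant of a fact is equal to itself.
            new |= {(c, SAME, c) for c in (s, p, o)}
        equalities = [(s, o) for (s, p, o) in J if p == SAME]
        for (s, p, o), (x, y) in product(J, equalities):
            # Replacement rules: x may be replaced by y in any position.
            if s == x:
                new.add((y, p, o))
            if p == x:
                new.add((s, y, o))
            if o == x:
                new.add((s, p, y))
        r_edges = [(s, o) for (s, p, o) in J if p == "R"]
        for (s1, o1), (s2, o2) in product(r_edges, r_edges):
            if o1 == o2:
                new.add((s1, SAME, s2))  # both have an R-edge to the same target
            if s1 == s2:
                new.add((o1, SAME, o2))  # both have an R-edge from the same source
        if new <= J:
            return J
        J |= new

def order_key(c):
    return (c != SAME, c)  # assumed order: SAME is the smallest constant

def rewriting(U):
    """Return (pi, I) where pi(c) is the least constant equal to c in U."""
    constants = {c for fact in U for c in fact}
    pi = {c: min({c} | {o for (s, p, o) in U if p == SAME and s == c}, key=order_key)
          for c in constants}
    I = {(pi[s], pi[p], pi[o]) for (s, p, o) in U}
    return pi, I

E = {("a", "R", "b"), ("c", "R", "d"), ("a", "R", "d")}
J = materialise_example(E)
pi, I = rewriting(J)

E_prime = E - {("a", "R", "d")}
J_prime = materialise_example(E_prime)
pi_prime, I_prime = rewriting(J_prime)
```

Under these assumptions `J` has 14 facts and `I` has five, while `I_prime` has eight facts and is a superset of `I`, even though `J_prime` is a subset of `J`.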

Solution Overview. B/F$^\approx$ consists of Algorithms 1–7 that follow the same basic idea as B/F; to highlight the differences, lines that exist in B/F in a modified form are marked with *, and new lines and algorithms are marked with >.

We initially mark all facts in $\pi(E^-)$ as ‘doubtful’; that is, we indicate that their truth might change. Next, for each ‘doubtful’ fact $F$, we determine whether $F$ is provable from $E'$ and, if not, we identify the immediate consequences of $F$ (i.e., the facts in $I$ that can be derived using $F$) and mark them as ‘doubtful’; we know exactly which facts have changed after processing all ‘doubtful’ facts. To check the provability of $F$, we use backward chaining to identify the facts in $I$ that can prove $F$, and we use forward chaining to actually prove $F$. The latter process also identifies the necessary changes to $\pi$ and $I$, which we apply to $(\pi, I)$ in a final step. We next describe the components of B/F$^\approx$ in more detail.


Procedure saturate() is given a dataset $C \subseteq I$ of checked facts, and it computes the set $L$ containing each fact $F$ derivable from $E'$ such that each fact in a derivation of $F$ is contained in $C^\pi$; thus, $L$ identifies the part of $J'$ to recompute. Rather than storing $L$ directly, we adapt the r-materialisation algorithm by Motik et al. (2015a) and represent $L$ by its rewriting $(\gamma, P \setminus \hat{P})$; the role of the two sets $P$ and $\hat{P}$ is discussed shortly. Lines 36–40 compute the facts in $L$ derivable immediately from $E'$: we iterate over each $F \in C$ and each $G \in F^\pi$ occurring in $E'$; since we represent $L$ by its rewriting, we add $\gamma(G)$ to $P$. The roles of set $Y$ and lines 37–39 will be discussed shortly. Lines 41–50 compute the facts in $L$ derivable using rules: we consider each fact $F$ in $P \setminus \hat{P}$ (lines 41–42), each rule $r$, and each match $\sigma$ of $F$ to a body atom of $r$ (line 48), we evaluate the remaining body atoms of $r$ (line 49), and we derive $\gamma(h(r)\tau)$ for each match $\tau$ (line 50). This basic idea is slightly more complicated by rewriting: if $F = a \approx b$, we modify $\gamma$ so that one constant becomes the representative of the other one (line 45). As a consequence, facts can become ‘outdated’ w.r.t. $\gamma$, so we keep track of such facts using $\hat{P}$: if $F$ is ‘outdated’, we add $F$ to $\hat{P}$ and $\gamma(F)$ to $P$ (line 44); due to the latter, $P \setminus \hat{P}$ eventually contains all ‘up to date’ facts. Finally, we apply the reflexivity rules ($\approx_4$) to $F$ (line 47).

Procedure saturate() is repeatedly called in B/F$^\approx$. Set $C$, however, never shrinks between successive calls, so set $L$ never shrinks either; hence, at each call we can just continue the computation instead of starting ‘from scratch’. A minor problem arises if we derive a fact $F$ with $F \notin C^\pi$ and so we do not add $\gamma(F)$ to $P$, but $C$ is later extended so that $F \in C^\pi$ holds. We handle this by maintaining a set $Y$ of ‘delayed’ facts: in line 59 we add $F$ to $Y$ if $F \notin C^\pi$; and in line 40 we identify each ‘delayed’ fact $G \in C^\pi \cap Y$ and add $\gamma(G)$ to $P$.

Procedure rewrite($a$, $b$) implements rewriting: we update $\gamma$ (line 52), apply the replacement rules ($\approx_1$)–($\approx_3$) to already processed facts containing ‘outdated’ constants (line 54), ensure that $\Gamma$ is normal w.r.t. $\gamma$ (line 56), and reapply the normalised rules (lines 57–58). Motik et al. (2015a) discuss in detail the issues related to rule updating and reevaluation.

Procedure checkProvability() takes a fact $F \in I$ and ensures that, for each $G \in F^\pi$, we have $G \in J'$ iff $\gamma(G) \in P \setminus \hat{P}$; that is, we know the correct status of each fact that $F$ represents. To this end, we add $F$ to $C$ (line 22) and thus ensure that $(\gamma, P \setminus \hat{P})$ correctly represents $L$ (line 23). Each fact is added to $C$ only once, which guarantees termination of the recursion. We then use backward chaining to examine facts occurring in proofs of $F$ and recursively check their provability; we stop at any point during that process if all facts in $F^\pi$ become provable (lines 24, 28, 31, and 35). Lines 25–28 handle the reflexivity rules ($\approx_4$): to check provability of $c \approx c$, we recursively check the provability of each fact containing $c$. Lines 29–31 handle replacement rules ($\approx_1$)–($\approx_3$): we recursively check the provability of $c \approx c$ for each constant $c$ occurring in $F$. Finally, lines 32–35 handle the rules in $\pi(\Pi)$: we consider each rule $r \in \pi(\Pi)$ whose head matches $F$ and each substitution $\tau$ that matches the body of $r$ in $I$, and we recursively check the provability of $b(r)\tau$.



atoms of $r$ in $I$, and we add $h(r)\tau$ to $D$ for each $\tau$ such that $b(r)\tau \subseteq I$. Once $D$ is processed, $(\gamma, P \setminus \hat{P})$ reflects the changes to $(\pi, I)$, which we exploit in Algorithm 2.


3.2 Optimisations 


Reflexivity. Facts of the form $F = c \approx c$ can be expensive for backward chaining: due to reflexivity rules in lines 25–28 we may end up recursively proving each fact $G$ that mentions $c$. However, $F$ holds trivially if $E'$ contains a fact mentioning $c$, in which case we can consider $F$ proven and avoid any recursion. This is implemented in lines 37–39.


Avoiding Redundant Derivations. Assume that $\Gamma$ contains a rule $y_1 \approx y_2 \leftarrow \langle x, R, y_1 \rangle \wedge \langle x, R, y_2 \rangle$, and consider a call to saturate() in which facts $\langle a, R, b \rangle$ and $\langle a, R, d \rangle$ both end up in $P$. Unless we are careful, in line 50 we might consider substitution $\tau_1 = \{ x \mapsto a, y_1 \mapsto b, y_2 \mapsto d \}$ twice: once when we match $\langle a, R, b \rangle$ to $\langle x, R, y_1 \rangle$, and once when we match $\langle a, R, d \rangle$ to $\langle x, R, y_2 \rangle$. Such redundant derivations can substantially degrade performance.

To solve this problem, set $V$ keeps track of the processed subset of $P$: after we extract a fact $F$ from $P$, in line 42 we transfer $F$ to $V$; moreover, in line 49 we evaluate rule bodies in $V \setminus \hat{P}$ instead of $P \setminus \hat{P}$. Now if $\langle a, R, b \rangle$ is processed before $\langle a, R, d \rangle$, at that point we have $\langle a, R, d \rangle \notin V$, so $\tau_1$ is not returned as a match in line 49; the situation when $\langle a, R, d \rangle$ is processed first is analogous. This, however, does not eliminate all repetition: $\tau_2 = \{ x \mapsto a, y_1 \mapsto b, y_2 \mapsto b \}$ is still considered when $\langle a, R, b \rangle$ is matched to either of the two body atoms in the rule. Therefore, we annotate (see Section 3.3) the body atoms of rules so that, whenever $F$ is matched to some body atom $B_i$, no atom $B_j$ preceding $B_i$ in the body of $r$ can be matched to $F$. In our example, $\tau_2$ is thus considered only when $\langle a, R, b \rangle$ is matched to $\langle x, R, y_1 \rangle$.

B/F$^\approx$ avoids redundant derivations in a similar vein: set $O$ tracks the processed subset of $D$; in lines 10 and 14 we match the relevant rules in $I \setminus O$; and in line 16 we add a fact to $O$ once it has been processed.

Disproved Facts. For each $F \in I$ with $F^\pi \cap J' = \emptyset$, no fact in $F^\pi$ participates in a proof of any fact in $J'$. Thus, in line 7 we collect all such facts in a set $S$ of disproved facts, and in lines 26 and 33 we exclude $S$ from backward chaining.

Singletons. If we encounter $F = c \approx c$ in line 9 or 29 where $c$ represents only itself (i.e., $|c^\pi| = 1$), then we know that no fact in $F^\pi$ can derive a new fact using rules ($\approx_1$)–($\approx_3$), and so we can avoid considering rules ($\approx_1$)–($\approx_3$).

3.3 Formalisation 

We borrow the notation by Motik et al. (2015b) to formalise B/F$^\approx$. We recapitulate some definitions, present the pseudo-code, and formally state the algorithm's properties.

Given a dataset $X$ and a fact $F$, operation $X$.add($F$) adds $F$ to $X$, and operation $X$.delete($F$) removes $F$ from $X$; both return $\mathsf{t}$ if $X$ was changed. For iteration, operation $X$.next returns the next fact from $X$, or $\varepsilon$ if no such fact exists.

An annotated query has the form $Q = B_1^{\bowtie_1} \wedge \dots \wedge B_k^{\bowtie_k}$, where each $B_i$ is an atom and annotation $\bowtie_i$ is either empty or equal to $\neq$. Given datasets $X$ and $Y$ and a substitution $\sigma$, operation $X$.eval($Q$, $Y$, $\sigma$) returns a set containing each smallest substitution $\tau$ such that $\sigma \subseteq \tau$ and, for $1 \le i \le k$, (i) $B_i\tau \in X$ if $\bowtie_i$ is empty or (ii) $B_i\tau \in X \setminus Y$ if $\bowtie_i$ is $\neq$. We often write $[Z \setminus W]$ instead of $X$, meaning that $Q$ is evaluated in the difference of sets $Z$ and $W$.

Given a fact $F$, operation $\Pi$.matchHead($F$) returns all tuples $(r, Q, \sigma)$ with $r \in \Pi$ a rule of the form (1), $\sigma$ a substitution such that $H\sigma = F$, and $Q = B_1 \wedge \dots \wedge B_n$. Moreover, operation $\Pi$.matchBody($F$) returns all tuples $(r, Q, \sigma)$ with $r \in \Pi$ a rule of the form (1), $\sigma$ a substitution such that $B_i\sigma = F$ for some $1 \le i \le n$, and $Q$ is defined as

$$Q = B_1^{\neq} \wedge \dots \wedge B_{i-1}^{\neq} \wedge B_{i+1} \wedge \dots \wedge B_n. \qquad (2)$$
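The interplay between matchBody, the annotations in (2), and eval can be sketched in Python as follows. This is an illustrative simplification, not the RDFox implementation: atoms are tuples, variables are strings starting with `?`, an annotated query is a list of (atom, annotated) pairs, and the names `unify`, `eval_query`, and `match_body` are hypothetical. The demonstration reuses the rule and the two facts from the discussion of redundant derivations in Section 3.2 and evaluates in $V$ only, as if $\hat{P}$ were empty.

```python
SAME = "owl:sameAs"  # stands for the equality constant

def is_var(term):
    return term.startswith("?")

def unify(atom, fact, sigma):
    """Return the smallest extension of sigma mapping atom to fact, or None."""
    tau = dict(sigma)
    for t, c in zip(atom, fact):
        if is_var(t):
            if tau.setdefault(t, c) != c:
                return None
        elif t != c:
            return None
    return tau

def eval_query(X, Q, Y, sigma):
    """Evaluate Q in X; atoms flagged as annotated may only match facts in X minus Y."""
    results = [dict(sigma)]
    for atom, annotated in Q:
        candidates = X - Y if annotated else X
        extended = []
        for tau in results:
            for fact in candidates:
                tau2 = unify(atom, fact, tau)
                if tau2 is not None:
                    extended.append(tau2)
        results = extended
    return results

def match_body(rules, F):
    """Yield (rule, Q, sigma) for each body atom B_i that matches F, with Q as in (2)."""
    for rule in rules:
        head, body = rule
        for i, atom in enumerate(body):
            sigma = unify(atom, F, {})
            if sigma is not None:
                Q = ([(b, True) for b in body[:i]]         # preceding atoms: annotated
                     + [(b, False) for b in body[i + 1:]])  # following atoms: plain
                yield rule, Q, sigma

rule = (("?y1", SAME, "?y2"), [("?x", "R", "?y1"), ("?x", "R", "?y2")])
V, considered = set(), []
for F in [("a", "R", "b"), ("a", "R", "d")]:
    V.add(F)  # F becomes part of the processed set before rules are evaluated
    for r, Q, sigma in match_body([rule], F):
        for tau in eval_query(V, Q, {F}, sigma):
            considered.append(tuple(sorted(tau.items())))
```

With these inputs, the list `considered` ends up with four substitutions and no repetitions. If the annotations were dropped, the substitution mapping both `?y1` and `?y2` to `b` would be produced twice.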


Finally, given a mapping $\gamma$ of constants to constants, and constants $d$ and $c$, operation $\gamma$.mergeInto($d$, $c$) modifies $\gamma$ so that $\gamma(e) = c$ holds for each constant $e$ with $\gamma(e) = d$.
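A direct reading of mergeInto can be sketched in Python as below. The class `ConstantMapping` and its dictionary-based storage are assumptions made for illustration; constants without an explicit entry are taken to map to themselves, which matches initialising $\gamma$ as the identity.

```python
class ConstantMapping:
    """A mapping gamma of constants to constants; unlisted constants map to themselves."""

    def __init__(self):
        self._rep = {}

    def __call__(self, c):
        return self._rep.get(c, c)

    def merge_into(self, d, c):
        """Make gamma(e) = c for every constant e that currently has gamma(e) = d."""
        affected = [e for e, rep in self._rep.items() if rep == d]
        if self(d) == d:
            affected.append(d)  # d itself is represented by d
        for e in affected:
            self._rep[e] = c

gamma = ConstantMapping()
gamma.merge_into("d", "c")  # d is now represented by c
gamma.merge_into("c", "a")  # c and d are now both represented by a
```

This linear scan is the simplest correct choice; a practical system would more likely keep, for each representative, a list of the constants it represents so that a merge touches only the affected entries.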

B/F$^\approx$ consists of Algorithms 1–7. Theorem 1 shows that the algorithm is correct and that, just like the seminaive algorithm [Abiteboul et al., 1995], it does not repeat derivations; the proof is given in the appendix.


4 Evaluation 

We have implemented and evaluated the B/F$^\approx$ algorithm in the open-source RDF data management system RDFox. The system and the test data are all available online.²

Objectives. Updates can be handled either incrementally or 
by rematerialisation, and equality can be handled either by 
rewriting or by axiomatisation, giving rise to four possible 
approaches to updates. Our first objective was to compare all 
of them to determine their relative strengths and weaknesses. 

2 https://krr-nas.cs.ox.ac.uk/2015/IJCAI/RDFox/index.html


As $E^-$ increases in size, incremental update becomes harder, but rematerialisation becomes easier. Thus, our second objective was to investigate the relationship between the update size and the performance of the respective approaches.


Datasets. Equality is often used in OWL ontologies on the Semantic Web, so we based our evaluation on several well-known synthetic and ‘real’ RDF datasets.

Each dataset comprises an OWL ontology and a set of explicit facts $E$. UOBM [Ma et al., 2006] extends LUBM [Guo et al., 2005], and we used the data generated for 100 universities; we did not use LUBM because it does not use $\approx$. Claros contains information about cultural artefacts.³ DBpedia consists of structured information extracted from Wikipedia.⁴ UniProt is a knowledge base about protein sequences;⁵ we selected a subset of the original (very large) set of facts. Finally, OpenCyc is an extensive, manually curated upper ontology.⁶ Following Zhou et al. (2013), we converted the ontologies into lower bound (L) and upper bound (U) programs: the former is the OWL 2 RL subset of the ontology transformed into datalog as described by Grosof et al. (2003), and the latter captures all consequences of the ontology using an unsound approximation. Upper bound programs are interesting as they tend to be ‘hard’. We also manually extended the lower bound (LE) of Claros with ‘hard’ rules (e.g., we defined related documents as pairs of documents that refer to the same topic).


Update Sets. For each dataset, we randomly selected several subsets $E^-$ of $E$. We considered small updates of 100 and 5k facts on all datasets. Moreover, for each dataset we identified the ‘equilibrium’ point $n$ at which B/F$^\approx$ and Remat$^\approx$ take roughly the same time. If $n$ was large, we generated subsets $E^-$ with sizes equal to 25%, 50%, 75%, and 100% of $n$; otherwise, we divided $n$ in an ad hoc way.


Test Setting. We used a Dell server with two 2.60GHz Intel 
Xeon E5-2670 CPUs and 256 GB of RAM running Fedora 
release 20, kernel version 3.17.7-200.fc20.x86_64. 

Test Results. Table 1 summarises our test results. For each dataset, we show the numbers of explicit facts ($|E|$) and rules ($|\Pi|$), the number of facts in the initial r-materialisation ($|I^\approx|$), and the time ($T^\approx$) and the number of derivations ($D^\approx$) used to compute it via rewriting; moreover, we show the latter three numbers for the initial materialisation computed using axiomatised equality ($|I^A|$, $T^A$, and $D^A$). For each set $E^-$, we show the numbers $\Delta|I^\approx|$ and $\Delta|I^A|$ of deleted facts with rewriting and axiomatisation, respectively, as well as the times ($T$) and the number of derivations ($D$) for each of the four update approaches. All times are in seconds. We could not complete all axiomatisation tests with Claros-LE as each run took about two hours. Due to the upper bound transformation, the r-materialisation of UOBM-100-U contains a constant $c$ with $|c^\pi| = 3930$; thus, when $\approx$ is axiomatised, deriving just all equalities involving $c^\pi$ requires $3930^3 \approx 60$ billion derivations, which causes the initial materialisation to last longer than four hours. The number of derivations $D$ in


3 http://www.clarosnet.org/XDB/ASP/clarosHome/

4 http://dbpedia.org/

5 http://www.uniprot.org

6 http://www.cyc.com/platform/opencyc



































Input Variables

  $E$ : the explicit facts
  $\Pi$ : the datalog program
  $(\pi, I)$ : the r-materialisation of $E$ w.r.t. $\Pi$
  $E^-$ : the facts to delete from $E$

Global Temporary Variables

  $D$ : the consequences of $E^-$ that might require deletion
  $O$ : the processed subset of $D$
  $C$ : the facts whose provability must be checked
  $\gamma$ : the mapping recording the changes needed to $\pi$
  $P$ : the proved facts
  $\hat{P}$ : the proved rewritten facts
  $Y$ : the proved facts not in $C^\pi$
  $V$ : the processed subset of $P$
  $S$ : the set of disproved facts

Algorithm 1 B/F$^\approx$()

*  1: $C := D := P := \hat{P} := Y := O := S := V := \emptyset$
>  2: initialise $\gamma$ as identity and $\Gamma := \Pi$
   3: for each $F \in E^-$ do
   4:     if $E$.delete($F$) then $D$.add($\pi(F)$)
   5: while ($F := D$.next) $\neq \varepsilon$ do
   6:     checkProvability($F$)
*  7:     for each $G \in C$ s.t. allDisproved($G$) do $S$.add($G$)
*  8:     if not allProved($F$) then
>  9:         if $F = c \approx c$ and $|c^\pi| > 1$ then
> 10:             for each $G \in I \setminus O$ with $c \in \mathsf{voc}(G)$ do
> 11:                 $D$.add($G$)
> 12:         for each $c \in \mathsf{voc}(F)$ do $D$.add($c \approx c$)
  13:         for each $(r, Q, \sigma) \in \pi(\Pi)$.matchBody($F$) do
  14:             for each $\tau \in [I \setminus O]$.eval($Q$, $\{F\}$, $\sigma$) do
  15:                 $D$.add($h(r)\tau$)
  16:         $O$.add($F$)
* 17: propagateChanges()

> Algorithm 2 propagateChanges()

  18: for each $c \approx c \in C$ and each $d$ with $\pi(d) = c$ do
  19:     $\pi(d) := \gamma(d)$
  20: for each $F \in D \setminus (P \setminus \hat{P})$ do $I$.delete($F$)
  21: for each $F \in P \setminus \hat{P}$ do $I$.add($\pi(F)$)

> Algorithm 3 Auxiliary functions

  allProved($F$): $\mathsf{t}$ iff $F \notin S$ and $\gamma(F^\pi) \subseteq (P \setminus \hat{P})$
  allDisproved($F$): $\mathsf{t}$ iff $\gamma(F^\pi) \cap (P \setminus \hat{P}) = \emptyset$

Algorithm 4 checkProvability($F$)

  22: if not $C$.add($F$) then return
  23: saturate()
* 24: if allProved($F$) then return
> 25: if $F = c \approx c$ then
> 26:     for each $G \in I \setminus S$ with $c \in \mathsf{voc}(G)$ do
> 27:         checkProvability($G$)
> 28:         if allProved($F$) then return
> 29: for each $c \in \mathsf{voc}(F)$ with $c \approx c \notin S$ and $|c^\pi| > 1$ do
> 30:     checkProvability($c \approx c$)
> 31:     if allProved($F$) then return
  32: for each $(r, Q, \sigma) \in \pi(\Pi)$.matchHead($F$) do
  33:     for each $\tau \in [I \setminus S]$.eval($Q$, $\emptyset$, $\sigma$) and $G \in b(r)\tau$ do
  34:         checkProvability($G$)
  35:         if allProved($F$) then return

Algorithm 5 saturate()

  36: while ($F := C$.next) $\neq \varepsilon$ do
> 37:     if $F = c \approx c$ then
> 38:         for each $d \in \mathsf{voc}(E)$ with $\pi(d) = c$ do
> 39:             $P$.add($\gamma(d) \approx \gamma(d)$)
* 40:     for each $G \in F^\pi \cap (E \cup Y)$ do $P$.add($\gamma(G)$)
  41: while ($F := P$.next) $\neq \varepsilon$ do
* 42:     if $F \in P \setminus (\hat{P} \cup V)$ and $V$.add($F$) then
> 43:         $G := \gamma(F)$
> 44:         if $F \neq G$ then $\hat{P}$.add($F$) and $P$.add($G$)
> 45:         else if $F = a \approx b$ and $a \neq b$ then rewrite($a$, $b$)
> 46:         else
* 47:             for each $c \in \mathsf{voc}(G)$ do prove($c \approx c$)
  48:             for each $(r, Q, \sigma) \in \Gamma$.matchBody($G$) do
* 49:                 for each $\tau \in [V \setminus \hat{P}]$.eval($Q$, $\{G\}$, $\sigma$) do
* 50:                     prove($h(r)\tau$)

> Algorithm 6 rewrite($a$, $b$)

  51: $c := \min\{a, b\}$; $d := \max\{a, b\}$
  52: $\gamma$.mergeInto($d$, $c$)
  53: for each $F \in P \setminus \hat{P}$ with $d \in \mathsf{voc}(F)$ do
  54:     $\hat{P}$.add($F$) and $P$.add($\gamma(F)$)
  55: for each $r \in \Gamma$ with $r \neq \gamma(r)$ do
  56:     replace $r$ in $\Gamma$ with $r' := \gamma(r)$
  57:     for each $\tau \in [V \setminus \hat{P}]$.eval($b(r')$, $\emptyset$, $\emptyset$) do
  58:         prove($h(r')\tau$)

> Algorithm 7 prove($F$)

  59: if $\pi(F) \in C$ then $P$.add($F$) else $Y$.add($F$)


Theorem 1. Let $(\pi, I)$ be the r-materialisation of a dataset $E$ w.r.t. a program $\Pi$, and let $E^-$ be a dataset.

1. Algorithm 1 terminates, at which point $(\pi, I)$ contains the r-materialisation of $E \setminus E^-$ w.r.t. $\Pi$.

2. Each combination of a rule $r$ and a substitution $\tau$ is considered at most once in line 50 or line 58, but not both.

3. Each combination of a rule $r$ and a substitution $\tau$ is considered at most once in line 15.
Table 1: Experimental results


B/F$^\approx$ is the sum of the number of times a fact is determined as ‘doubtful’ (lines 11, 12, and 15), checked in backward chaining (lines 27, 30, and 34), or derived in forward chaining (line 59); we use this number to estimate reasoning difficulty independently from implementation details.

Discussion. For updates of 100 facts, B/F$^\approx$ outperforms all other approaches, often by orders of magnitude, and in most cases it does so even for much larger updates.

Even when $|I^A| - |I^\approx|$ is ‘small’ (i.e., when not many equalities are derived), B/F$^\approx$ outperforms B/F$^A$. This seems to be mainly because B/F$^A$ ascribes no special meaning to $\Pi^\approx$ and so it does not use the optimisation from lines 37–39; thus, when trying to prove $c \approx c$, B/F$^A$ performs backward chaining via rules ($\approx_4$) and so it potentially examines each fact containing $c$. On Claros-L, although $|I^A|$ and $|I^\approx|$ are of similar sizes, $I^A$ contains one constant $c$ with $|c^\pi| = 306$, which gives rise to $306^3$ derivations; this explains the difference in the performance of B/F$^\approx$ and B/F$^A$.
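The cubic figures quoted for cliques of equal constants can be reproduced with a short calculation. The Python sketch below rests on one plausible reading of where the cube comes from, namely that rule ($\approx_1$) applied to the $n^2$ equalities of an $n$-constant clique has one instance per triple of constants; the function name is hypothetical, and the brute-force count is only run for a small $n$.

```python
from itertools import product

SAME = "owl:sameAs"  # stands for the equality constant

def replacement_rule_instances(n):
    """Count instances of rule (approx_1) over the equalities of an n-constant clique."""
    constants = range(n)
    equalities = {(s, SAME, o) for s, o in product(constants, constants)}
    count = 0
    for (x1, _, x3) in equalities:           # body atom <x1, approx, x3>
        for (y, _, x1_prime) in equalities:  # body atom x1 approx x1'
            if y == x1:
                count += 1                   # one derivation of <x1', approx, x3>
    return count

small_clique = replacement_rule_instances(6)  # brute-force count for n = 6
uobm_derivations = 3930 ** 3                  # clique reported for UOBM-100-U
claros_derivations = 306 ** 3                 # clique reported for Claros-L
```

Under this reading, the clique in UOBM-100-U accounts for roughly 60.7 billion rule instances and the one in Claros-L for roughly 28.7 million, whereas rewriting stores a single representative for each clique.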

Remat$^\approx$ outperforms B/F$^\approx$ in cases similar to those described by Motik et al. (2015b). For example, in UOBM, relation hasSameHomeTownWith is symmetric and transitive, which creates cliques of connected constants; B/F always recomputes each changed clique, thus repeating most of the ‘hard’ work. Equality connects constants in cliques, which poses similar problems for B/F$^\approx$. For example, due to the constant $c$ with $|c^\pi| = 3930$, deleting 5k facts in UOBM-100-U results in only 961k (about 1.2% of $|I^\approx|$) facts being added to set $C$ in line 22, but these facts contribute to 73% of the derivations from the initial r-materialisation; thus, B/F$^\approx$ repeats in Algorithm 5 a substantial portion of the initial work.


On OpenCyc-L, Remat$^\approx$ already outperforms B/F$^\approx$ on updates of 1k triples, which was surprising since the former makes more derivations than the latter. Our investigation revealed that OpenCyc-L contains about 200 rules of the form $\langle x, \mathit{type}, y \rangle \leftarrow \langle x, R_i, y \rangle$ that never fire during forward chaining; however, to check provability of $\langle a, \mathit{type}, C \rangle$, Algorithm 4 considers in line 32 each time each of the 200 rules. After removing all such ‘idle’ rules manually, B/F$^\approx$ and Remat$^\approx$ could update 1k tuples in roughly the same time. Further analysis revealed that the slowdown in B/F$^\approx$ occurs mainly in line 40: the condition is checked for 13.3M facts $F$, and these give rise to 139M facts in $F^\pi$, each requiring an index lookup; the latter number is similar to the number of derivations in rematerialisation, which explains the slowdown. We believe one can check this condition more efficiently by using additional book-keeping.







































































































































5 Conclusion 

This paper describes what we believe to be the first approach to incremental maintenance of datalog materialisation when the latter is computed using rewriting, a common optimisation used when programs contain equality. Our algorithm proved to be very effective, particularly on small updates.

In our future work, we shall aim to address the issues we identified in Section 4. For example, to optimise the check in line 40, we shall investigate ways of keeping track of how explicit facts are merged so that we can implement the test by iterating over the appropriate subset of $E$ rather than over $F^\pi$. Moreover, we believe we can considerably improve the efficiency of both the initial materialisation and the incremental updates by using specialised algorithms for rules that produce large cliques; hence, we shall identify common classes of ‘hard’ rules and then develop such specialised algorithms.
A Proof of Theorem 1


Let $\Pi$ be a program (that ascribes no special meaning to $\approx$), and let $E$ be a dataset. A derivation tree for a fact $F$ from $E$ w.r.t. 
$\Pi$ is a finite tree $T$ in which each node $t$ is labelled with a fact $F_t$, and each nonleaf node $t$ is labelled with a rule $r_t \in \Pi$ and a 
substitution $\sigma_t$ such that the following holds: 


D1. $F_\epsilon = F$ holds for the root $\epsilon$ of $T$; 

D2. $F_t \in E$ holds for each leaf node $t$ of $T$; and 

D3. $h(r_t)\sigma_t = F_t$ and $b(r_t)\sigma_t = \{F_{t_1}, \ldots, F_{t_n}\}$ hold for each nonleaf node $t$ of $T$ with children $t_1, \ldots, t_n$. 


The materialisation $\Pi^\infty(E)$ of $E$ w.r.t. $\Pi$ is the smallest set containing each fact that has a derivation tree from $E$ w.r.t. $\Pi$; this 
definition of $\Pi^\infty(E)$ is equivalent to the one in Section 2. The height of a derivation tree is the length of its longest branch; 
moreover, the height of a fact $F \in \Pi^\infty(E)$ w.r.t. $E$ and $\Pi$ is the minimum height of a derivation tree for $F$ from $E$ w.r.t. $\Pi$. 
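
The definitions above can be illustrated with a small Python sketch. It is a simplified model rather than part of the proof: the class `Node` and the functions `height` and `is_derivation_tree` are hypothetical names, facts are plain strings, and each nonleaf node stores a ground rule instance (head, body), so the substitution $\sigma_t$ is assumed to have been applied already.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    fact: str
    rule: Optional[tuple] = None          # (head, body) ground instance; None for a leaf
    children: list = field(default_factory=list)

def height(node):
    """Length of the longest branch; a tree consisting of one leaf has height 0."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)

def is_derivation_tree(root, fact, explicit):
    """Check conditions D1-D3 for a tree labelled with ground rule instances."""
    if root.fact != fact:                                  # D1
        return False
    stack = [root]
    while stack:
        t = stack.pop()
        if not t.children:
            if t.fact not in explicit:                     # D2
                return False
            continue
        if t.rule is None:
            return False
        head, body = t.rule
        if head != t.fact or set(body) != {c.fact for c in t.children}:   # D3
            return False
        stack.extend(t.children)
    return True

# Example: E = {p(a)}, ground rule instances q(a) <- p(a) and r(a) <- q(a).
leaf = Node("p(a)")
middle = Node("q(a)", ("q(a)", ("p(a)",)), [leaf])
root = Node("r(a)", ("r(a)", ("q(a)",)), [middle])
tree_height = height(root)
valid = is_derivation_tree(root, "r(a)", {"p(a)"})
```

The height of a fact is then the minimum of `height` over all its derivation trees; the sketch computes the height of one given tree only.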

In the rest of this paper, we make the following assumption (*): no derivation tree contains a node $t$ where $r_t$ is $(\approx_2)$ and 
$\sigma_t(x_1) = \sigma_t(x_1')$, or $r_t$ is $(\approx_3)$ and $\sigma_t(x_2) = \sigma_t(x_2')$, or $r_t$ is $(\approx_4)$ and $\sigma_t(x_3) = \sigma_t(x_3')$. This is w.l.o.g. because, for each such 
$t$, we have $F_t = F_{t_1}$ for $t_1$ the first child of $t$; hence, we can always remove such $t$ from the derivation tree. 
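
The following Python sketch shows why assumption (*) loses no generality. It is an illustration only: the names `Node`, `REPLACED`, and `prune` are hypothetical, the rule labels `eq2` to `eq4` stand for the three replacement rules, and the sketch assumes that each such rule has the fact being rewritten as its first body atom and that the variables `x1p`, `x2p`, `x3p` play the role of $x_1'$, $x_2'$, $x_3'$.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    fact: tuple
    rule: str = ""                                   # "" marks a leaf
    subst: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# For each replacement rule, the pair of variables that must differ under (*).
REPLACED = {"eq2": ("x1", "x1p"), "eq3": ("x2", "x2p"), "eq4": ("x3", "x3p")}

def prune(node):
    """Remove every node that replaces a constant by itself."""
    children = [prune(child) for child in node.children]
    pair = REPLACED.get(node.rule)
    if pair is not None and node.subst[pair[0]] == node.subst[pair[1]]:
        # The node derives the same fact as its first child, so it is redundant.
        return children[0]
    return Node(node.fact, node.rule, node.subst, children)

# A node that rewrites the first position of (a, R, b) using a = a.
data = Node(("a", "R", "b"))
same = Node(("a", "=", "a"))
idle = Node(("a", "R", "b"), "eq2",
            {"x1": "a", "x1p": "a", "x2": "R", "x3": "b"}, [data, same])
pruned = prune(idle)
```

Pruning never increases the height of a tree and leaves the fact at the root unchanged, which is all the later arguments rely on.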

Next, we recapitulate Theorem 1 and present its proof, which we split into several claims. 

Theorem 1. Let $(\pi, I)$ be the r-materialisation of a dataset $E$ w.r.t. a program $\Pi$, and let $E^-$ be a dataset. 

1. Algorithm 1 terminates, at which point $(\pi, I)$ contains the r-materialisation of $E \setminus E^-$ w.r.t. $\Pi$. 

2. Each combination of a rule $r$ and a substitution $\tau$ is considered at most once in line 50 or line 58, but not both. 

3. Each combination of a rule $r$ and a substitution $\tau$ is considered at most once in line 15. 


In the rest of this section, we fix a datalog program $\Pi$ and datasets $E$ and $E^-$. Let $(\pi, I)$ be the r-materialisation of $E$ w.r.t. $\Pi$; 
let $J := (\Pi \cup \Pi_\approx)^\infty(E)$; let $E' := E \setminus E^-$; let $(\pi', I')$ be the r-materialisation of $E'$ w.r.t. $\Pi$; and let $J' := (\Pi \cup \Pi_\approx)^\infty(E')$. 
By the monotonicity of datalog, we clearly have $J' \subseteq J$. 
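
As a rough illustration of how $(\pi, I)$ and $(\pi', I')$ can differ, the Python sketch below computes representatives before and after deleting one explicit equality. It rests on simplifying assumptions that are not part of the proof: the rules of $\Pi$ are ignored entirely, equalities are given as pairs of constants, the representative of a constant is taken to be the smallest constant (in string order) of its equivalence class, and the names `representatives` and `rewrite` are hypothetical.

```python
def representatives(equalities, constants):
    """Map each constant to the smallest member of its equivalence class."""
    parent = {c: c for c in constants}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]        # path halving
            c = parent[c]
        return c

    for s, t in equalities:
        a, b = find(s), find(t)
        if a != b:
            # Keep the smaller constant as the root, so roots are class minima.
            parent[max(a, b)] = min(a, b)
    return {c: find(c) for c in constants}

def rewrite(facts, rep):
    """Replace every constant in every fact by its representative."""
    return {tuple(rep[c] for c in fact) for fact in facts}

constants = {"a", "b", "c", "p"}
facts = {("c", "p", "c")}

pi = representatives([("a", "b"), ("b", "c")], constants)    # before the update
I = rewrite(facts, pi)

pi_new = representatives([("a", "b")], constants)            # after deleting b = c
I_new = rewrite(facts, pi_new)
```

In this example the deletion splits the class of `a`, so the representative of `c` changes and the stored fact must be replaced, which is the situation the proof handles through facts whose constants change representative.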

We next show that Algorithm 5 essentially captures the r-materialisation algorithm by Motik et al. (2015a). 

Claim 1. Let P and P be as obtained after a call to Algorithm |j]/« line 23 let K := {d ss d \ d £ voc {E)}, and let L be the 
set containing precisely each fact F that has a derivation T from K U E w.r.t. II U II~ in which F t £ G* holds for each node 
t ofT. Then, the following properties hold: 


1 . 

2 . 

3. 


7 (c) = min E C {L) for each constant c; 

P\P = 7 (L); and 

each combination of a rule r and a substitution r is considered at most once in line 50 or line 58 but not both. 


Proof (Sketch). Algorithm 5 is a variant of the r-materialisation algorithm by Motik et al. (2015a), so properties 1-3 hold by a 
straightforward modification of the correctness proof of that algorithm. This proof is quite lengthy so, for the sake of brevity, 
we just summarise the differences. 


Lines 37-39 ensure 7 (C 71 ’ (T K ) CP\P, and line |40| ensures "f(C n IT E') C P\P; hence, G* T (K U E') plays the 
same role that explicit facts play in the algorithm by Motik et al. ( 2015a| . 

Let $F$ be an arbitrary fact considered in line 41. To ensure property 4 of Claim 1, the algorithm by Motik et al. (2015a) 
uses slightly different annotated queries to apply the rules in lines 48-49 only to facts extracted before $F$. In contrast, 
Algorithm 7 keeps track of previously processed facts in set $V$, but this has exactly the same effect. 

All derivations of a fact in line 47, 50, or 58 are handled by Algorithm [?], which, for each $F$, checks whether $\pi(F) \in C$; 
this is equivalent to checking $F \in C^\pi$. If the latter holds, then $F$ is added to $P$, and otherwise $F$ is added to $Y$. If in a 
subsequent invocation of Algorithm 5 set $C$ is extended such that $\pi(F) \in C$ suddenly holds, then $\gamma(F)$ is added to $P$ in 
line 40. This, however, does not change the algorithm in any substantial way. □ 


The following claim follows immediately from the definitions in Algorithm[3] 

Claim 2. The following properties hold for an arbitrary fact F normal w.r.t. tt: 

1. allProved(P) = t if and only if F <fS and F* C (P \ P) 7 ; and 

2. allDisproved(F) = t if and only if F n T (P \ P ) 7 = 0. 

We next show that sets C, P, P, S, and 7 always satisfy an important property. 

Claim 3. Assume that Algorithm^is applied to some fact F, mapping 7 , and sets S, C, P, and P where S is normal w.r.t. 7 r 
and S n T J' = 0, and assume that all of these satisfy the following property: 

(0) for each G £ C, either G~ C (P \ P ) 7 or, for each fact H £ G~, each derivation tree T for H from E' w.r.t. 

II U n~, and each child ti of the root ofT, we have it(F ti ) £ C. 


Then, property (O) remains preserved after the invocation of Algorithm^ 


Proof. The proof is by induction on recursion depth of Algorithm [4] at which a fact is added to G. For the induction base, (0) 
remains preserved if the algorithm returns in line [22] 
holds for each fact G £ C different from F after a recursive call in line |27[|3Q| 

then property 1 of Claim [ 2 ] implies F 77 C (P \ P) 7 , so property (0) remains 


For the induction step, assume that 
If the algorithm returns in line 

preserved. Otherwise, consider an arbitrary fact H £ P and an arbitrary derivation tree T for H from E' w.r.t. II U II 
Let ti,... ,t n be the children (if any exist) of the root e of T; since J contains each fact labelling a node of T, we have 
{Ft., ...,F ( JC J'C J, Now let Fi = 7r(F t J; by the definition of r-materialisation, we have {Pi,... ,F n } C /. Moreover, 
for each 1 < i < n, we have Fi £ J' and S 77 (T J' = 0, which imply F, f S n \ moreover, S is normal w.r.t. n, so Fi ^ S as 
well. Finally, we clearly have 7r(r £ <r £ ) = 7r(r £ )7r(cr £ ), and so h(7r(r £ ))7r((7 £ ) = P and b(7r(r £ ))7r(cr £ ) = {Fi, ..., F n } C I\S. 
We next consider the forms of r £ . 




Assume $r_\epsilon$ is of the form $(\approx_1)$, so $n = 1$. Fact $F_1$ is eventually considered in line 26 so, due to the recursive call in line 27, 
we have $F_1 \in C$, as required. 

Assume $r_\epsilon$ is of the form $(\approx_2)$-$(\approx_4)$; thus, $n = 2$, $F_1 = F$, and $F_2 = c \approx c$ for some constant $c$. Fact $F_1 = F$ is added to 
$C$ in line 22. Moreover, by assumption (*) on the shape of $T$, we have $F_{t_2} = s \approx t$ with $s \neq t$; since $\pi(s) = \pi(t) = c$, we 
have $|c^\pi| > 1$. Thus, due to the recursive call in line 30, we have $F_2 \in C$, as required. 

Assume $r_\epsilon \in \Pi$. Then, $\pi(r_\epsilon) \in \pi(\Pi)$, so $\pi(r_\epsilon)$ and $\pi(\sigma_\epsilon)$ are eventually considered in lines 32 and 33; hence, due to the 
recursive call in line 34, we have $F_i \in C$ for each $1 \leq i \leq n$, as required. □ 


Calls in line [^ensure another property on G, P, P, and S. 

Claim 4. The following properties hold after each line of Algorithm^ 
1. property (()) is satisfied; 


2. (P \ P) 7 =C n n J'; 

3. 7 (c) = min E c (G 7r (T J') for each constant c; and 

4. s n nJ' = 0. 

5. For each fact F £ O, we have F 77 f- J'. 

6. DCC. 


Proof. The proof is by induction on the number of iterations of the loop in lines |5ffT6[ For the induction base, we have 
S' = C = P = O = 0 in line [T] so properties 1-5 clearly hold initially. For the induction step, assume that all properties hold 
before line[6] Due to property 4 and Claim[3] property 1 remains preserved after line[6] hence, we next consider properties 2-6. 

(Property 2) Let K and L be as stated in Claim[lj note that property 2 of Claim[l]is equivalent to (P \ P ) 7 = L. We first 
show (P \ P ) 7 CF T n J'. Since K C J', we clearly have J' = (II U n~)°°(Ar U E'). Moreover, for each F £ (P \ P ) 7 we 
have F £ L, so by the definition of L there exists a derivation tree T for F from K U E' w.r.t. P U E~ such that F t £ C 77 holds 
for each node t of T; but then, we clearly have F £ C" (T J'. We next prove G 77 D/C (P \ P ) 7 by induction on the height 
h of a fact F £ C 77 (T J' w.r.t. E' and II U II~. 

• If h = 0, then F £ E'\ since F £ C 77 , by the definition of L we have F £ L; but then, F £ (P \ P) 7 as well. 

• Assume that the claim holds for each fact in C 7r D J' whose height w.r.t. If and II U II~ is at most h, and consider an 
arbitrary fact F £ C" D J' with height h + 1; let T be the corresponding derivation tree for F. Moreover, assume that 
F (P \ P) 7 ; then, F £ G* implies 7 r(F) £ C\ hence, property (0) ensures that, for each child L of the root of T, we 
have 7t(F ( .) £ C, which is equivalent to F t . £ C 71 ’. Now the height of each F t . w.r.t. E' and II U II~ is at most h so, by 
the induction assumption, we have F f . £ (P\ P) 7 = L. The latter ensures that, for each F t ., there exists a derivation tree 
Ti in which each node is labelled by a fact contained in G". Let T' be the derivation tree in which the root e is labelled 
with the same fact, rule, and substitution as in T, and each Ti is a subtree of e. Clearly, T' is a derivation tree for F 
from E' w.r.t. II U 1I~ in which each node is labelled by a fact contained in C~\ thus, by the definition of L, we have 
F £ L = (P \ P) 7 , as required. 


(Property 3) This property follows directly from property 1 of Claim 1 and property 2 of Claim 4. 

(Property 4) Assume that some fact G is added to S in line [7] Then allDisproved(G) = t, which by property 2 of Clai m [2 
implies G 7r fl (P \ P) 7 = 0. Property 2 of Claim Cl holds at this point, so we have G" fi CG fl ./' = 0. Finally, lines |6| and [22 
ensure G £ C, so we have G 71 C G"; thus, G" fl T 0, and so adding G to S preserves property 4. 

(Property 5) Assume that some fact F is added to O in line 16 Then allProved(P) = f, which by property 1 of Claim[2] 
implies F £ S or F 71 % (P\ P) 7 . In the former case, F 77 <2 J' holds directly from property 4. In the latter case, property 2 of 
Claim[4]holds at this point, so we have F~ G 77 D J'\ moreover, lines[ 6 ]and 22 ensure F £ C, which implies F~ C C 77 ; this, 
in turn, implies F 77 % J'. Consequently, adding F to O preserves property 5. 

(Property 6) Each fact $F$ extracted from $D$ in line 5 is passed in line 6 to Algorithm 4, which in turn ensures that $F$ is added 
to $C$ in line 22. □ 


We next show that set D contains each fact that needs to be deleted, and each fact that contains a constant whose representative 
changes as a result of the update. 


Claim 5. For each fact $F \in J \setminus J'$, the following two properties hold in line 17: 

1. $\pi(F) \in D$, and 

2. if $F = s \approx t$ with $s \neq t$, then $D$ contains each fact $G \in I$ such that $\pi(s) \in \mathsf{voc}(G)$ and $G^\pi \not\subseteq J'$. 


Proof. Consider an arbitrary fact $F \in J \setminus J'$. 

(Property 1) We prove the claim by induction on the height $h$ of $F$ w.r.t. $E$ and $\Pi \cup \Pi_\approx$; the notion of the height of $F$ is 
correctly defined because $F \in J$. For the induction base, assume $h = 0$; now $F \in J$ implies $F \in E$; moreover, $F \notin J'$ implies 
$F \notin E'$; thus, $F \in E^-$, and so $\pi(F)$ is added to $D$ in lines 3-4. For the induction step, assume that the claim holds for each fact 
in $J \setminus J'$ whose height w.r.t. $E$ and $\Pi \cup \Pi_\approx$ is at most $h$, and assume that the height of $F$ w.r.t. $E$ and $\Pi \cup \Pi_\approx$ is $h + 1$. Let $T$ be 
a corresponding derivation tree for $F$ from $E$ w.r.t. $\Pi \cup \Pi_\approx$; let $t_1, \ldots, t_n$ be the children of the root $\epsilon$ of $T$; and let $F_i = \pi(F_{t_i})$ 
for each $1 \leq i \leq n$. Moreover, let $N$ contain precisely each $F_i$, $1 \leq i \leq n$, such that $F_i \in D$ and $F_i^\pi \not\subseteq J'$. Since $F \notin J'$, some 
$j$ with $1 \leq j \leq n$ exists such that $F_{t_j} \notin J'$; moreover, $T$ is a derivation tree for $F$ from $E$ w.r.t. $\Pi \cup \Pi_\approx$, so $F_{t_j} \in J$ and the 
height of $F_{t_j}$ is at most $h$; but then, we have $\pi(F_{t_j}) = F_j \in D$ by the induction hypothesis, and so we also have $F_j \in N$, that 
is, $N \neq \emptyset$. Each fact in $D$ is eventually considered in line 5; thus, let $F'$ be the fact from $N$ that is considered first. At that point, 
we have $O \cap N = \emptyset$ because facts are added to $O$ in line 16 only after they have been considered; hence, $F_i \in I \setminus O$ 
holds at this point for each 1 < i < n. Furthermore, F' £ I) C Glmplies (F r )~ C G 77 ; but then, (F')~ <2 J' and property 2 
of Cl aim [T] imply (F') 77 f (F \ F) 7 ; thus, property 1 of Claim^ensures we have allProved(F') = f and so the check in line [8] 
passes. We next consider the possible forms of the rule r e . 


Assume that $r_\epsilon$ is $(\approx_2)$-$(\approx_4)$. Then, we clearly have $\pi(F) = F_1$; fact $F_{t_2}$ is of the form $F_{t_2} = s \approx t$ with $s \neq t$ and 
$c = \pi(s) = \pi(t)$; and $c \in \mathsf{voc}(F_1)$. We have two possible ways to choose $F'$. If $F' = F_1$, then $\pi(F) = F_1 = F' \in D$ 
holds. If $F' = F_2$, then $s \neq t$ by assumption (*) on the shape of $T$, so $|c^\pi| > 1$ and the check in line 9 passes; furthermore, 
due to $F_1 \in I \setminus O$, we eventually consider fact $G = F_1 = \pi(F)$ in line 10 and add it to $D$ in line 11. 

Assume that $r_\epsilon$ is $(\approx_1)$. Then, $F$ is of the form $s \approx s$, so $\pi(F) = c \approx c$ for $c = \pi(s)$; clearly, we have $c \in \mathsf{voc}(F')$ and 
$F' = F_1$. But then, $\pi(F)$ is added to $D$ in line 12. 

Assume that $r_\epsilon \in \Pi$. We clearly have $\pi(r_\epsilon\sigma_\epsilon) = \pi(r_\epsilon)\pi(\sigma_\epsilon)$; therefore, we have $\pi(F) = \pi(h(r_\epsilon\sigma_\epsilon)) = h(\pi(r_\epsilon))\pi(\sigma_\epsilon)$ 
and $\pi(b(r_\epsilon\sigma_\epsilon)) = \{F_1, \ldots, F_n\} = b(\pi(r_\epsilon))\pi(\sigma_\epsilon) \subseteq I \setminus O$. Moreover, we clearly have $\pi(r_\epsilon) \in \pi(\Pi)$. Finally, let $i$ be the 
smallest integer with $1 \leq i \leq n$ such that $F_i = F'$, and let $Q$ be annotated query (2) obtained from $\pi(r_\epsilon)$ for that $i$; clearly, 
the way in which we chose $i$ ensures $F_j \neq F'$ for each $j$ with $1 \leq j < i$. All of these observations ensure together that 
$\langle \pi(r_\epsilon), Q, \sigma \rangle \in \pi(\Pi).\mathsf{matchBody}(F')$ is considered in line 13 and that $\pi(\sigma_\epsilon)$ is considered in line 14; thus, 
$\pi(F)$ is added to $D$ in line 15. 
(Property 2) Assume that F is of the form F = s ss t with s t, let c = 7r(s) = 7 r(f), and let F' = 7 r(F). Property 1 of 
this claim ensures F' = c~c£DCC, and so we have (F') 77 C C 77 ; but then, together with F f J', property 2 of Claim Bl 
ensures (F') 77 (P \ F) 7 ; finally, property 1 of Claim[2]ensures allProved(F') = f. Fact F' is eventually processed in lineQ^ 
and by the previous discussion the check in line [8] passes. Moreover, s f t implies |c 77 1 > 1, so the check in line[9]passes as 
well. Now consider an arbitrary fact G £ I such that c £ voc(G) and G 77 2 J'\ property 5 of Claim [4] ensures G f O, and 
therefore G is added to D in line[TT] □ 


We next show that Algorithm 1 correctly updates $I$ to $I'$. 

Claim 6. Algorithm 1 updates set $I$ to $I'$. 

Proof. Property 6 of Claim [4] and property 1 of Claim [5] clearly ensure that ([3]» holds. Furthermore, property 2 of Claim [4] 
clearly ensures that 0 holds. 

J \ J' C D 77 C G 77 (3) 

(F \ F ) 7 C J' C J (4) 

For convenience we recapitulate the definitions of 7 r(c), 7 r'(c), and 7 (c); note that (]7]» follows immediately from properties 2 
and 3 of Claim [4] Finally, 0. 0, and ([7]) clearly imply 0. 

7 r(c) = minE c (J) (5) 









7r'(c) = min E c ( J') 

(6) 

7(c) = min E C ((P\ P) 7 ) 

(7) 

^'((P\P) 7 ) = ^(P\P) 

(8) 


Before proceeding, we prove several useful properties. Consider an arbitrary constant c with ^ r(c) = c; by 0 and 0 - 0 . 
we clearly have n'(c) = c and 7 (c) = c. Thus, for each fact P with n(F) = F, we have tt'(F) = F and 7 (P) = F, which 
ensures the following properties: 

F £ I iff F £ J, F £ I' iff F £ J', F £ (P\ P) 7 iff F £ P\P, 

F £ D iff F £ D n , and F £ C iff F £ C 77 . K ’ 
We next show that lines 
F = c « c. Set P 77 clearly contains each triple of the form d 


19 update it to n'. To this end, consider arbitrary constants c and d with tt(d) = c, and let 

e £ J, which, together with 0, implies 


E d (F n n(P\ P) 7 ) = E d ((P \ P) 7 ), E d (F n n J 1 ) = E d (J r ), and E d (F* n J) = E d (J). (10) 


We now consider two possible cases. 

• Assume that F £ C. Thus, F 77 C G 77 holds, so property 2 of Claim[i]ensures P 77 n (P \ P ) 7 = P 77 n J' = V. But then, 
( fT0l > imply E d (V) = E d (J') = E d ((P \ P) 7 ). Finally, 0 and 0 imply n'(d) = 7 (d). 

• Assume that F £ C. We thus have P 77 n G 77 = 0; but then, J \ J' C G 77 implies P 77 n ( J \ J') = 0, which then implies 
P 77 H J = P 77 n J'. Finally, 0. 0, and ([TO]) together imply n'(d) = ir(d). 

We next prove / \ P = D \ (P \ P) and hence show that line [ 20 ] correctly deletes the relevant facts. To this end, we next 
consider each side of the inclusion. 

• Assume that F £ I\I'. Then F £ I implies 7 r(P) = P, so by 0 we have F £ J \ J 1 . By 0 we have P £ D 77 C G 77 , 
and by 0 we have F £ D C C. Moreover, P ./' and property 2 of Claim[4]imply F g (P\ P ) 1 , which by 0 implies 
P ^ P\P. Consequently, we have P £ D\(P\P). 

• Assume that F £ D\(P \ P). Then D C I implies F £ I, so 7 r(P) = P. Also, F ^ P \ P and 0 imply P ^ (P \ P) 7 . 
But then, property 2 of Claim[4]ensures P ^ C 77 (T J'. Due to D C C and 0, we have P £ C 77 ; thus, P ^ J', so by 0 
we have F qL V. Consequently, we have P £ I \ F. 


We finally prove that P = [/ \ (/ \ /')] U 7 r'(P \ P) and hence show that line 21 correctly adds the relevant facts; please 


remember that, due to updates in lines [18] - [T9| mapping 7r actually contains 7r' in line 21 

• Assume that P £ [I \ (/ \ /')] U 7r'(P \ P). We consider two cases. 

- Assume that P £ / \ (/ \ P). Thus, F £ I and P ^ I \ P; but then, we have P £ P, as required. 

- Assume that P £ n'(P\P). Then, some G £ (P \ P) 7 exists such that = F. By property 2 of Claim |d] we 
have G £ but then, we have tt'(G) = P £ P, as required. 

• Assume that P £ /' and P ^ I \ (/ \ P). Thus, P ^ /, but clearly F £ J' C J. Due to the latter, some G £ I exists 
such that 7r(P) = G; clearly, F ^ G and G 77 2 «P Since G £ /, we have 7 t(G) = G; thus, by 0 we have 7 t'(G) = G. 
Moreover, P £ P implies 7 r'(P) = P. Consequently, distinct constants a £ voc(P) and 6 £ voc(G) exist such that 
a ~ 6 £ J\J'; but then, property 2 of Claim [ 5 ] and G* % J ensure that G £ D C G C G 77 , which ensures P £ G 77 . 
Since P £ J', by property 2 of Claim[4]we have P £ (P \ P) 7 ; but then, by 0 we have P £ 7 r'(P \ P), as required. □ 

We next show that Algorithm 1 does not repeat derivations. 

Claim 7. Each combination of a rule $r$ and a substitution $\tau$ is considered at most once in line 15. 


Proof. Assume that a rule $r \in \Pi$ and substitution $\tau$ exist that are considered in line 15 twice, when (not necessarily distinct) 
facts $F$ and $F'$ are extracted from $D$. Moreover, let $B_i$ and $B_{i'}$ be the body atoms of $r$ that $\tau$ matches to $F$ and $F'$, that is, 
$F = B_i\tau$ and $F' = B_{i'}\tau$. Finally, let $Q'$ be the annotated query considered in line 13 when atom $B_{i'}$ of $r$ is matched to $F'$. We 
have the following possibilities. 

• Assume that $F = F'$. Then, $B_i$ and $B_{i'}$ must be distinct, so w.l.o.g. assume that $i < i'$. But then, query $Q'$ contains atom 
$B_i^{\neq}$, so $\tau$ cannot be returned in line 14 when evaluating $Q'$. 

• Assume that $F \neq F'$ and that, w.l.o.g., $F$ is extracted from $D$ before $F'$. Then, we have $F \in O$ due to line 16, and therefore 
we have $F \notin I \setminus O$; consequently, $\tau$ cannot be returned in line 14 when evaluating $Q'$. □ 
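
The two cases of this argument can be replayed on a toy example. The Python sketch below is a simplified model and not the algorithm itself: the names `unify`, `match_body`, and `run` are hypothetical, variables are strings starting with `?`, and an annotated query is assumed to require that the atoms preceding the pivot atom are matched to facts different from the extracted fact, with all atoms evaluated over the facts not yet processed.

```python
def unify(atom, fact, subst):
    """Extend subst so that atom is mapped to fact, or return None."""
    result = dict(subst)
    for term, const in zip(atom, fact):
        if term.startswith("?"):
            if result.setdefault(term, const) != const:
                return None
        elif term != const:
            return None
    return result

def match_body(body, fact, store, processed):
    """Yield substitutions that match some atom i to `fact` and the rest in
    store minus processed; atoms before i must not be matched to `fact`."""
    available = store - processed
    for i, pivot in enumerate(body):
        start = unify(pivot, fact, {})
        if start is None:
            continue
        partial = [start]
        for j, atom in enumerate(body):
            if j == i:
                continue
            partial = [s2
                       for s in partial
                       for g in available
                       if not (j < i and g == fact)
                       for s2 in [unify(atom, g, s)]
                       if s2 is not None]
        yield from partial

def run(body, store, queue):
    """Process facts in order and record every substitution considered."""
    processed, seen = set(), []
    for fact in queue:
        for subst in match_body(body, fact, store, processed):
            seen.append(tuple(sorted(subst.items())))
        processed.add(fact)          # plays the role of adding the fact to O
    return seen

body = [("?x", "R", "?y"), ("?y", "R", "?z")]
store = {("a", "R", "a"), ("a", "R", "b")}
seen = run(body, store, [("a", "R", "a"), ("a", "R", "b")])
```

Here the fact `("a", "R", "a")` matches both body atoms, yet the substitution mapping all three variables to `a` is produced only with the first atom as pivot, and no substitution involving an already processed fact is produced again.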
